AITopics | incorrect data

How Much of Your Data Can Suck? Thresholds for Domain Performance and Emergent Misalignment in LLMs

Ouyang, Jian; T, Arman; Jin, Ge

arXiv.org Artificial Intelligence, Sep-25-2025

This paper investigates the impact of incorrect data on the performance and safety of large language models (LLMs), specifically gpt-4o, during supervised fine-tuning (SFT). Although LLMs are becoming increasingly vital across broad domains such as finance, coding, law, and health, fine-tuning on incorrect data can lead to "emergent misalignment," producing harmful or deceptive outputs unrelated to the intended task. We evaluate gpt-4o models fine-tuned with varying ratios (10% to 90% correct) of both obviously and subtly incorrect data across four domains: coding, finance, health, and legal. Our findings show that even modest amounts of incorrect data (10-25%) dramatically degrade domain performance but not moral alignment. At least 50% correct data is needed for models to consistently recover strong performance, though they rarely match the robustness and safety of the base model, which exhibits near-perfect alignment and zero dangerous completions out of the box. This research emphasizes that the cost of incorrect data is heavy, highlighting the critical need for extremely high-quality data curation or, alternatively, for leveraging robust base models without unnecessary fine-tuning in high-stakes applications.
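The mixing step described in the abstract (fine-tuning sets with a fixed share of correct examples) could be sketched roughly as follows. This is an illustrative sketch only, not the authors' pipeline: the function name `mix_examples`, the pool arguments, and the fixed `total` size are all assumptions introduced here.

```python
import random


def mix_examples(correct, incorrect, correct_ratio, total, seed=0):
    """Build a dataset of `total` items with a given share of correct ones.

    Hypothetical helper: `correct` and `incorrect` are lists of examples,
    `correct_ratio` is the target fraction of correct examples (0.0 to 1.0).
    """
    if not 0.0 <= correct_ratio <= 1.0:
        raise ValueError("correct_ratio must be in [0, 1]")

    n_correct = round(correct_ratio * total)
    n_incorrect = total - n_correct
    if n_correct > len(correct) or n_incorrect > len(incorrect):
        raise ValueError("not enough examples in one of the pools")

    # A seeded generator keeps each mix reproducible across runs.
    rng = random.Random(seed)
    mixed = rng.sample(correct, n_correct) + rng.sample(incorrect, n_incorrect)
    rng.shuffle(mixed)
    return mixed
```

Calling `mix_examples(correct_pool, incorrect_pool, 0.25, total=1000)` would, under these assumptions, yield 250 correct and 750 incorrect examples in shuffled order. Holding `total` constant across ratios keeps the dataset size from confounding the comparison.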

incorrect data, large language model, machine learning, (14 more...)

arXiv:2509.19325

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.87)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Government > Tax (1.00)
Banking & Finance (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Your business can tame AI hallucinations with this data-driven approach

FOX News, Oct-25-2023, 06:00:55 GMT

Kara Frederick, tech director at the Heritage Foundation, discusses the need for regulations on artificial intelligence as lawmakers and tech titans discuss the potential risks. Picture this: you open up your favorite food delivery app to order a late-night snack. You select your go-to order and finalize your purchase. When your food comes, you find that they gave you ranch dressing to go with your cinnamon roll. You know for sure that you asked for extra icing on the side, and when you check the app you find that you did indeed ask for icing but received ranch.

ai hallucination, hallucination, information, (12 more...)

Country: North America > United States (0.05)

Industry:

Information Technology (0.70)
Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (0.36)

Technology:

Information Technology > Architecture > Real Time Systems (0.41)
Information Technology > Artificial Intelligence > Machine Learning (0.33)

Sensor Validation Using Dynamic Belief Networks

Nicholson, Ann; Brady, J. M.

arXiv.org Artificial Intelligence, Mar-13-2013

The trajectory of a robot is monitored in a restricted dynamic environment using light beam sensor data. We have a Dynamic Belief Network (DBN), based on a discrete model of the domain, which provides discrete monitoring analogous to conventional quantitative filter techniques. Sensor observations are added to the basic DBN in the form of specific evidence. However, sensor data is often partially or totally incorrect. We show how the basic DBN, which infers only an impossible combination of evidence, may be modified to handle specific types of incorrect data which may occur in the domain. We then present an extension to the DBN, the addition of an invalidating node, which models the status of the sensor as working or defective. This node provides a qualitative explanation of inconsistent data: it is caused by a defective sensor. The connection of successive instances of the invalidating node models the status of a sensor over time, allowing the DBN to handle both persistent and intermittent faults.
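The idea of an invalidating node linked across time slices can be illustrated with a small discrete forward filter. The sketch below is not the authors' network: the five-cell corridor, the function names, and all probabilities (`p_move`, `p_fail`, `p_recover`, `p_hit`) are assumptions chosen for illustration. It tracks a joint belief over the robot's cell and a sensor status that is either working or defective.

```python
from itertools import product

N_CELLS = 5                          # assumed corridor length
STATUSES = ("working", "defective")  # states of the invalidating node


def move_prob(x_prev, x_next, p_move=0.8):
    """Robot advances one cell with probability p_move, else stays put."""
    if x_prev == N_CELLS - 1:        # last cell: robot can only stay
        return 1.0 if x_next == x_prev else 0.0
    if x_next == x_prev + 1:
        return p_move
    if x_next == x_prev:
        return 1.0 - p_move
    return 0.0


def status_prob(s_prev, s_next, p_fail=0.05, p_recover=0.1):
    """Link between successive status nodes; low p_recover means persistent faults."""
    if s_prev == "working":
        return p_fail if s_next == "defective" else 1.0 - p_fail
    return p_recover if s_next == "working" else 1.0 - p_recover


def obs_prob(reading, x, status, p_hit=0.95):
    """A working sensor usually reports the true cell; a defective one is uninformative."""
    if status == "defective":
        return 1.0 / N_CELLS
    if reading == x:
        return p_hit
    return (1.0 - p_hit) / (N_CELLS - 1)


def filter_step(belief, reading):
    """One predict-and-update step over the joint (cell, status) belief."""
    states = list(product(range(N_CELLS), STATUSES))
    new_belief = {}
    for x2, s2 in states:
        prior = sum(
            belief[(x1, s1)] * move_prob(x1, x2) * status_prob(s1, s2)
            for x1, s1 in states
        )
        new_belief[(x2, s2)] = prior * obs_prob(reading, x2, s2)
    norm = sum(new_belief.values())
    return {state: p / norm for state, p in new_belief.items()}


def p_defective(belief):
    """Marginal probability that the sensor is currently defective."""
    return sum(p for (_, s), p in belief.items() if s == "defective")


def initial_belief(p_working=0.95):
    """Robot starts in cell 0; the sensor is probably working."""
    belief = {state: 0.0 for state in product(range(N_CELLS), STATUSES)}
    belief[(0, "working")] = p_working
    belief[(0, "defective")] = 1.0 - p_working
    return belief
```

Starting from `initial_belief()`, a reading that is consistent with the motion model (for example cell 1) keeps `p_defective` low, while an implausible reading (for example cell 4) shifts probability toward the defective status. Raising `p_recover` would model intermittent rather than persistent faults.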

artificial intelligence, machine learning, sensor, (14 more...)

arXiv:1303.5419

Country: North America > United States > California (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
